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Convex Fused Lasso Denoising with Non-Convex 
Regularization and its use for Pulse Detection 

Ankit Parekh and Ivan W. Selesnick 


Abstract —We propose a convex formulation of the fused lasso 
signal approximation problem consisting of non-convex penalty 
functions. The fused lasso signal model aims to estimate a sparse 
piecewise constant signal from a noisy observation. Originally, the 
1 1 norm was used as a sparsity-inducing convex penalty function 
for the fused lasso signal approximation problem. However, the 
norm underestimates signal values. Non-convex sparsity-inducing 
penalty functions better estimate signal values. In this paper, 
we show how to ensure the convexity of the fused lasso signal 
approximation problem with non-convex penalty functions. We 
further derive a computationally efficient algorithm using the 
majorization-minimization technique. We apply the proposed 
fused lasso method for the detection of pulses. 

Index Terms —Sparse signal, total variation denoising, fused 
lasso, non-convex regularization, pulse detection. 


I. Introduction 

We consider the problem of estimating a sparse piecewise 
constant signal x from its noisy observation y, i.e., 

y = xAw, (1) 


where w represents zero-mean additive white Gaussian noise. 
Estimation of sparse piecewise continuous signals arise in tran¬ 
sient removal [28], genomic hybridization [24], [32], signal 
and image denoising [1], [9], prostate cancer analysis [31], 
sparse trend hltering [13], [23], [33], [34] and biophysics [15]. 
In order to estimate sparse piecewise constant signals, it has 
been proposed [31] to solve the following sparse-regularized 
optimization problem 


arg mm 


iJ: 


12 


\\y-x\\l- 




( 2 ) 


where Aq > 0 and Ai > 0 are the regularization parameters 
and the matrix D is dehned as 


-1 1 


D = 


£) g (3) 


-1 1 


The optimization problem (2) is well-known as the £i fused 
lasso signal approximation (ELSA) problem [31]. The ELSA 
problem has been explored to aid the diagnosis of Alzheimer’s 
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disease [36] and background subtraction problems [37], how¬ 
ever with a different data-fidelity term. Note that when An = 0, 
problem (2) reduces to the total variation denoising (TVD) 
problem [26]. 

It is known that the norm, when used as a sparsity- 
inducing regularizer, underestimates the signal values. The 

norm is generally not the tightest convex envelope for 
sparsity [12]. In order to better estimate signal values, non- 
convex penalty functions are often favored over the £i norm 
[2], [4], [19], [22], [25], [35]. However, the use of non-convex 
penalty functions generally leads to non-convex optimization, 
which suffer from several issues (spurious local minima, 
initialization, convergence, etc.). 

In this paper, we propose to estimate sparse piecewise 
constant signals via the following convex non-convex (CNC) 
ELSA problem 

r 1 ^ 

argmmj F{x) = -\\y - x\\l +Xo^(j){xn\ao) 

^ n—1 

N-1 

-I- Ai ^ (l){[Dx\n\ai) 

n—1 

where R —^ R with a ^ 0 is a non-convex sparsity- 
inducing regularizer. Specifically, we propose that the regu¬ 
larization terms be chosen so that the objective function F in 
(4) is convex. As a result, the CNC ELSA approach avoids 
the drawbacks of non-convex optimization. The non-convex 
penalty function (f) aims to induce sparsity more strongly than 
the norm and thus better estimate the signal values. The 
parameter a controls the degree of non-convexity of </>; a higher 
value of a indicates a higher degree of non-convexity for (j). 
As a main result, we state and prove a condition that qq and 
tti must satisfy to ensure the objective function F in (4) is 
strictly convex. As a consequence of the convexity condition, 
well-known convex optimization techniques can be used to 
reliably obtain the global minimum of the objective function 
F. As a second main result, we provide an efficient fast con¬ 
verging algorithm for the proposed CNC ELSA problem (4). 
The algorithm is derived using the majorization-minimization 
(MM) procedure [8]. 

The idea of formulating a convex problem with non-convex 
regularization was described by Blake and Zisserman [3], and 
Nikolova [17], [18]. The idea is to balance the positive second- 
derivatives of the data-fidelity term with the negative second- 
derivatives of the non-convex penalty function. This approach 
has been successfully applied to various signal processing 
applications (e.g., [6], [14], [21], [29] and the references 
therein). Using this technique, a modihed formulation of the 
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li FLSA problem (2) was proposed with an aim to induce 
sparsity more strongly than the norm [1]. Further, a two-step 
procedure was used to obtain the solution to the modified fused 
lasso problem. However, the modified fused lasso (MDFL) 
problem [1] considers only the first of the two regularization 
terms as non-convex. 

This paper is organized as follows. In Section II we describe 
the class of non-convex penalty functions. In Section III we 
provide the convexity condition for the objective function F in 
(4). We derive an algorithm to solve the CNC FLSA problem 
(4) based on the MM procedure in Section IV. In Section V 
we apply the proposed CNC FLSA approach to the problem of 
detecting ECG pulses in strong additive white Gaussian noise 
(AWGN). 


11. Preliminaries 

A. Notation 

We denote vectors and matrices by lower and upper case 
letters respectively. The A^-point signal y is represented by the 
vector 


1) ^ is continuous on M, twice differentiable on M\{0} and 

symmetric, i.e., a) = a) 

2) (j)'{x) > 0,Va; > 0 

3) ^"(x) < 0,Vx > 0 

4) ,^'(0+) = 1 

5) inf a) = (/'"(0+; a) = —a 

x^O 

An example of a penalty function, which satisfies Assump¬ 
tion 1, is the logarithmic penalty function [4] defined as 


^(a;;a) 


- log(l -I- a|a:|), a > 0 
|a;|, 0 = 0. 


( 10 ) 


Note that when o = 0, this penalty function reduces to the 
£i norm. Other examples of non-convex penalty functions 
satisfying Assumption 1 include the arctangent and the rational 
penalty functions [11], [27]. Note that the £p norm does not 
satisfy Assumption 1. 

We note the following lemma, which we will use to obtain 
a convexity condition for optimization problem (4). 

Lemma 2: [21] Let R —>■ R satisfy Assumption 1. The 

function s: R —R defined as 


y = [yo,---,yN-ir, y g (5) 

where represents the transpose. The £i and £2 norms of 
the vector y are defined as 

n \ n 

The soft-threshold function [7] for A > 0, A G R is defined as 

{ X -b A, X < —A 

0, —A ^ X < A (7) 

X — A, X > A. 

For X G R^, the notation soft(x; A) implies that the soft- 
threshold function is applied element-wise to x with a thresh¬ 
old of A. 

Definition 1: The total variation denoising (TVD) problem 
[26] is defined as 

||t/-x||^-bA||Dx||i|, (8) 

where A > 0 is the regularization parameter. 

We note the following lemma, which provides an efficient 
two-step solution to the £\ FLSA problem (2). 

Lemma 1: [9, Lemma A.l] The solution x* to the £\ 

FLSA problem (2) is given by 

X* = soft(tvd(y, Ai), Ao). (9) 


tvd(j/; A) = argmin 



B. Non-convex penalty functions 

We propose to use parameterized non-convex penalty func¬ 
tions, with a view to induce sparsity more strongly than the 
£i norm. We assume such non-convex penalty functions have 
the following properties. 

Assumption 1: The penalty function (/): R —>■ R satisfies 
the following 


s(x; a) = (/)(x; a) — |x|, (11) 

is twice continuously differentiable and concave with 

—a ^ s''{x;a) ^ 0. (12) 


III. Convexity condition 
In this section, we seek to find a condition on the parameters 
ao and oi to ensure that the objective function F in (4) is 
strictly convex. The following theorem provides the required 
condition on ao and ai. 

Theorem 1: Let R — >■ R be a non-convex penalty 
function satisfying Assumption 1. The function F: R-^ —>■ R 
defined as 

^ N N-1 

= 2 lly “ 2 :|l 2 + Ao X! <l>ixn;ao) + Ai ^ ())([L>x]„; oi), 

n—1 n—1 

(13) 

is strictly convex if 


0 ^ agAo “t“ 4aiAi ^ 1. 
Proof: Consider the function G : R^ - 


(14) 

defined as 


N 


N-1 


G{x) = ^\\y- x\\l + Ao ^ s(x„; og) + Ai ^ s([L>x]„; ai), 

(15) 


n—1 


n—1 


where s(x;a) = fix',a) — |x|. From Lemma 2, the function 
G is twice continuously differentiable and its Hessian can be 
written as 


V^G = IF Aor(x; og) + XiD^TiDx', afjD, (16) 


where 


r(x; a) 


s"(xi;a) 


s"(x„;a) 


(17) 
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(a) G{x),ao = 1/2,ai = 1/8 



(b) G(2:),ao = 1/2,01 = 1/3 




Fig. 1. Surface plots illustrating the convexity condition, (a) The function 
G{x) (15) is convex for ao = 1/2 and ai = 1/8. (b) The function G{x) is 
not convex for ao = 1/2 and ai = 1/3 (these values violate Theorem 1). 


Fig. 2. Region of convexity for the function F in (13). The function F is 
strictly convex for any values of ao and ai inside the triangular region. 


For the strict convexity of G, we need to ensure that V^G is 
positive definite. To this end, from the assumptions on (f), it 


follows that 

r(x; oo) —ao-f, (18) 

Moreover, we can write 

D'^T{Dx;ai)D -aiD'^D (19) 

^ -4ai/. (20) 

The inequality (20) is obtained using the eigenvalues' of the 
matrix D^D. Using (16), (18) and (20), V^G ^ 0 if 

(1 — ooAo — 4aiAi)/ :?= 0, (21) 

or equivalently if, 

1 — UqAq — 4lCIiXi ^ 0. (22) 

From (11), (13) and (15) it is straighforward that 

F{x) = G{x) + Ao||a;||i + Ai||aa;||i. (23) 


Hence, F in (13) is strictly convex as long as the inequality 
(22) holds true (the function F is a sum of a strictly convex 
function G, the convex £i norm, and the convex TV penalty). 

□ 

The following example illustrates the convexity condition 
(22) for N = 2. Let Aq = Ai = 1 and y = 0. As per Theorem 
1, the function G (by extension the function F) is strictly 
convex if ao + 4ai ^ 1. Figure 1(a) shows the function G 
with the values oq = 1/2 and ai = 1/8. These values satisfy 
Theorem 1 and as a result the function G is strictly convex. 
On the other hand. Fig. 1(b) shows the function G when oq = 
1/2 and oi = 1/3. These values of oq and ai violate the 
Theorem 1; consequently the function G is non-convex as seen 
in Fig. 1(b). 

The convexity condition given by Theorem 1 in (14) implies 
that the values of oq and oi must lie on or below the line given 
by ooAi +4aiAi = 1. Figure 2 displays the values of oq and 
oi for which the function F is strictly convex. In order to 
maximally induce sparsity, we choose the values of oq and 
oi on the line. Specifically, we propose to select a value of 

*The eigenvalues of D are given by {2 — 2 cos(A:7r/7V)} for k = 
0, [30], 


ao G (0,1/A) and set the value of ai as 


at 


1 — uoAo 
4Ai 


(24) 


IV. Optimization Algorithm 

Due to Theorem 1, we can reliably obtain via convex 
optimization the global minimum of (4) as long as the pa¬ 
rameters ao and ai are chosen to satisfy (14). We derive an 
algorithm for the proposed CNC fused lasso method using the 
majorization-minimization (MM) procedure [8], such that 

= argminF^(a;,a;'=), (25) 

X 

where F^ denotes a majorizer of the function F in (4), and 
where k is the iteration index. The MM procedure guarantees 
that each iteration monotonically decreases the value of the 
objective function F in (4). We use the absolute value function 
and a linear function to majorize the non-convex penalty 
function. With this particular choice of majorizer, each MM 
update iteration involves solving the £i FLSA problem (2). 

To derive a majorizer of the function /), note that (j>{x; a) = 
s{x; a) + |x|. As a result, it suffices to majorize the function s 
with a linear term in order to obtain a majorizer of the function 
<j). Observe that since s is a concave function, the tangent line 
to s at a point v always lies above the function s. Using the 
tangent line to the function s, a majorizer of the function (p is 
given by : R X R —>• R, defined as 

4>^{x, V] a) = \x\ + s'{v, a){x — u) -I- s(u; a), (26) 

for cc, u G M. It follows straightforwardly that 

(j)^ {x,v,a) 4>{x\a), VxjVGR, (27) 

(j)^{v,v,a)=4i{v,a), Vv G R. (28) 

Figure 3(a) shows the absolute value function \x\. The 
twice continuously differentiable function s(x; a) is shown in 
Fig. 3(b), along with the tangent line to s(a;;a) at a; = 1. 
Figure 3(c) shows the non-convex penalty function (p and its 
majorizer p^{x, v; s) given by (26). The majorizer is the sum 
of the absolute value function in Fig. 3(a) and the tangent line 
to s{x; a) in Fig. 3(b). 

Using (27) and (28), we note that 

y/ {xn, Vn]a) Ss p{xn] o) , (29) 
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(a) 



X 


Fig. 3. (a) The absolute value function |x|. (b) The twice continuously 

differentiable function s(x;a) and the tangent line at x = 1. (c) The non- 
convex penalty function ^ and its majorizer a) given in (26). 


{[Dx]n, [Dv]n; a) > X! 4>{[Dx]n; a), (30) 

n n 

with equality if x = v. Further, note that 

= llxlli + s'{v; a)'^(x - v) + Ci, (31) 

n 

where s'(v] a) is the vector defined as [s'(v; a)]„ = s'{vn] a), 
i.e., the derivative of the function s is applied element-wise to 
the vector v. Further, note that Ci is a constant that does not 
depend on x. Similarly, we write 

'^(j)^{[Dx]n, [Dv]n;a) 

n 

= ll-Dxlli -I- s'(Dv; a)^ D{x — t;) -I- C 2 , (32) 

where C 2 is a constant that does not depend on x. Therefore, 
using (31) and (32), a majorizer of the objective function F 
in (4) is given by x ^ R, defined as 

F'^{x,v) = ]^\\y-x\\l + Aollxlli +\Qs'{v]af{x-v) 

+ Ai||Z?a;|li + Ais'(-Dt;; af’"D{x — v) + C, 

(33) 

where C is a constant that does not depend on x. Completing 


the square, we write (33) as 

F^(x,v) = i||y(f) -x\\l -b Aollxlli + AillFxlli +C, 

(34) 

where 

yiy) = y — Xos'{v; a) — XiD"^ s'(^Dv; a). (35) 

Therefore, each MM iteration consists of minimizing the 
function F^ (34), which is the ii FLSA problem (2) with 
y{v) as the input. Consequently, using (9) the MM update 
(25) can be written as 

y^ = y — Xos'{x^; a) — XiD^s'(^Dx’^]a) (36a) 

= soft(tvd(y^, Ai), Ao). (36b) 

Equation (36) constitutes a fast converging, computationally 
efficient algorithm to solve the proposed CNC FLSA problem 
(4). To implement (36b), we use the fast (finite-time) exact 
TV denoising algorithm based on the ‘taut-string’ method [5], 
which has a worst case complexity of 0{n). We initialize the 
iteration with the solution (9) to the FLSA problem (2). 
Note that the MM update (36) does not consist of any matrix 
inverses. 

V. Examples 

We consider the problem of estimating pulses of varying 
width in the presence of high additive white Gaussian noise 
(AWGN). We model the pulse signal as sparse piecewise 
constant and apply the proposed CNC ELSA problem (4) to 
estimate the individual pulses. We set the value of Ai as in 
[29], i.e., Ai = j3'/Na where /3 is a constant (usually 1/4) 
and cr represents the standard deviation of the additive white 
Gaussian noise. We manually set the value of Aq to obtain the 
lowest RMSE. 

Eigure 4(a) illustrates the synthetic clean pulse signal and 
Eig. 4(b) shows the noisy pulse signal. Shown in Eig. 4(c) 
are the estimates obtained using the standard £i norm and the 
non-convex atan penalty function [19, equation (23)]. It can 
be seen that the proposed CNC ELSA method estimates the 
pulses more accurately than the £i ELSA method. The relative 
performance of the CNC fused lasso in estimating pulses is 
also highlighted by the denoising error shown in Eig. 4(d). 

The value of the objective function F in (4), after each 
iteration of the MM algorithm (36), is shown in Eig. 5. The 
MM algorithm derived in [28, Table II] for a more general 
problem can also be used to solve the CNC ELSA problem 
(4). However, the MM algorithm in [28] utilizes a quadratic 
majorizer for the non-convex penalty function (j). Eigure 5 
shows that the proposed MM algorithm (36) converges much 
faster than the MM algorithm in [28]. The proposed MM 
algorithm (36) converges in about 5 iterations. 

In order to assess the relative performance of the proposed 
CNC fused-lasso method, we use 15 realizations of the noisy 
synthetic pulse signal in Eig. 4 and denoise them using both 
the original f'l fused lasso and the proposed CNC fused lasso 
methods. We also compare with the modified fused lasso 
(MDEL) [1], which is a special case of (4) with ai = 0; i.e.. 
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a 

Fig. 6. Average RMSE as a function of cr. The proposed CNC FLSA yields 
the lowest RMSE across different values of the noise variance (cr^). 



0 0.5 1 1.5 2 


(d) Denoising Error 



Time (n) 


Fig. 4. Denoising a synthetic signal using CNC fused lasso (4) and fused 
lasso (2). 



0 5 10 15 20 

Iteration (k) 

Fig. 5. The value of the objective function F in (4) at each iteration of the 
MM algorithm using quadratic and proposed majorizers. The proposed MM 
algorithm (36) converges within 5 iterations. 


only the first regularization term is non-convex. It can be seen 
in Fig. 6 that the proposed CNC FLSA (4) approach offers the 
lowest RMSE values across different noise levels. Further, for 
the test signal in Fig. 4(a), the average RMSE as a function 
of oq is shown in Eig. 7. Note that oi is set according to (24). 

As an another example, we consider the problem of de¬ 
tecting the QRS peaks in an ECG signal in AWGN with 


Fig. 7. Average RMSE as a function of ao for the synthetic test signal 
example in Fig. 4. 



0 3 6 9 12 

Time (s) 


Fig. 8. Denoising of ECG signal in strong AWGN. The £i FLSA under¬ 
estimates the signal values. The Pan-Tompkins detects several false-positive 
R-waves. 


high variance (a^). Wearable heart-rate monitors suffer from 
strong noise due to abrupt motion artifacts. Several methods 
for detecting the QRS peaks in ECG signals were studied in 
[10], [20]. We evaluate the detection of ECG R-waves in strong 
AWGN using the proposed CNC fused lasso method. Using a 
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sampling frequency of 256 Hz, we simulate the ECG signal 
using the synthetic ECG waveform generator, ECGSYN [16]. 
The clean and noisy ECG (cr = 0.4) are shown in Eig. 8. 
We set the parameters Aq = 0.6, Ai = 0.9, uq = 0.9/Ao and 
ai = 0.1/(4Ai). We use 20 iterations for the proposed CNC 
ELSA algorithm (36). 

Eigure 8 illustrates the denoised ECG signal using the £i 
ELSA and the proposed CNC ELSA methods. It can be seen 
that the £i ELSA does not detect all the R-waves. Moreover, 
the amplitudes of the pulses detected using the proposed CNC 
ELSA (4) are relatively high compared to those detected using 
the £i ELSA. The £\ norm tends to underestimate signal 
values. Also shown in Eig. 8 are the R-waves detected using 
the Pan-Tompkins real-time QRS detector [20]. Note that the 
Pan-Tompkins detector was not designed for ECG signals with 
high-noise variance. As a result, the Pan-Tompkins algorithm 
detects several false-positive R-waves. 

VI. Conclusion 

The fused lasso signal approximation (ELSA) problem aims 
to estimate sparse piecewise constant signals. In order to 
improve the accuracy of the ELSA approach, we use non- 
convex penalty functions as sparsity-inducing regularizers. In 
this paper we generalize the results of [1], which addresses the 
case wherein only one of the two regularization terms is non- 
convex. We prove that the proposed ELSA objective function 
is convex when the non-convex penalty parameters are suitably 
set. We also derive a computationally efficient algorithm using 
the majorization-minimization technique. The proposed CNC 
ELSA algorithm does not consist of any calculations involving 
a matrix inverse. We apply the proposed method to the problem 
of pulse detection under high additive white Gaussian noise. 
An illustration is provided for the detection of R-waves in an 
ECG signal. 
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